Approximating Personalized PageRank with Minimal Use of Web Graph Data
Authors
Abstract
In this paper, we consider the problem of calculating fast and accurate approximations to the personalized PageRank score ([8, 16]) of a webpage. We focus on techniques that improve speed by limiting the amount of webgraph data we need to access. PageRank scores are mainly used for ranking purposes, and generally only the scores exceeding a given threshold are relevant. In practice, and relative to the size of the web, only a small number of pages have a non-negligible personalized PageRank score. We capitalize on this property of the PageRank score to reduce the amount of webgraph data needed for computation. Our algorithms provide both the approximation to the personalized PageRank score and guidance in using only the necessary information, and therefore appreciably reduce not only the computational cost of the algorithm but also its memory and memory bandwidth requirements. Our algorithms assume that we have random access to a sparse representation of a webgraph (perhaps by crawling the web if necessary); some of them further require knowledge of a hostgraph ([19]). All of them provide a fast approximate calculation of the personalized PageRank score for pages where the score exceeds a given threshold. We report experiments with these algorithms on webgraphs of up to 118 million pages and prove theoretical approximation bounds for all of them. We conclude by proposing an application scenario of personalized Web Search that inspired and motivated our work.
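The abstract does not spell out the algorithms themselves, so the following is only an illustrative sketch of the general idea of thresholded, local computation. It uses a generic residual "push" scheme, a standard technique swapped in for illustration and not the paper's method. The names `approx_ppr`, `out_links`, `alpha`, and `eps` are hypothetical; `out_links` stands in for random access to a sparse webgraph, and adjacency lists are fetched only for pages whose pending score mass reaches the threshold.

```python
from collections import deque


def approx_ppr(out_links, source, alpha=0.15, eps=1e-4):
    """Illustrative push-style approximation of personalized PageRank.

    out_links: callable mapping a page to its list of successors
               (hypothetical stand-in for random access to the webgraph).
    source:    the page the walk restarts at (the personalization).
    alpha:     restart probability (assumed value).
    eps:       residual threshold below which a page is never expanded.
    Returns (p, r, fetched): estimates, leftover residuals, and the set
    of pages whose adjacency list was actually read.
    """
    p = {}                  # score estimates
    r = {source: 1.0}       # residual mass not yet pushed
    fetched = set()         # pages whose out-links were accessed
    queue = deque([source])
    queued = {source}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        mass = r.get(u, 0.0)
        if mass < eps:
            continue        # too little mass: do not touch the graph
        succ = out_links(u)
        fetched.add(u)
        p[u] = p.get(u, 0.0) + alpha * mass
        r[u] = 0.0
        # Assumption: a page with no out-links returns its mass to the source.
        targets = succ if succ else [source]
        share = (1.0 - alpha) * mass / len(targets)
        for v in targets:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps and v not in queued:
                queue.append(v)
                queued.add(v)
    return p, r, fetched


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
    p, r, fetched = approx_ppr(lambda u: graph.get(u, []), "a")
    print(sorted(p.items(), key=lambda kv: -kv[1]))
    print("pages fetched:", sorted(fetched))
```

Each push moves at least `alpha * eps` of mass into the estimates, so the loop performs at most `1 / (alpha * eps)` pushes regardless of graph size, and the estimates plus the residuals always sum to 1. Pages whose residual never reaches `eps` are never expanded, which is the sense in which such a scheme limits the graph data it reads.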
Similar resources
Community Detection Using Time-Dependent Personalized PageRank
Local graph diffusions have proven to be valuable tools for solving various graph clustering problems. As such, there has been much interest recently in efficient local algorithms for computing them. We present an efficient local algorithm for approximating a graph diffusion that generalizes both the celebrated personalized PageRank and its recent competitor/companion the heat kernel. Our algor...
Computing Personalized PageRank Quickly by Exploiting Graph Structures
We propose a new scalable algorithm that can compute Personalized PageRank (PPR) very quickly. The Power method is a state-of-the-art algorithm for computing exact PPR; however, it requires many iterations. Thus reducing the number of iterations is the main challenge. We achieve this by exploiting graph structures of web graphs and social networks. The convergence of our algorithm is very fast....
An Overview of Efficient Computation of PageRank
With the rapid growth of the Web, users get easily lost in the rich hyper structure. Providing relevant information to the users to cater to their needs is the primary goal of website owners. Therefore, finding the content of the Web and retrieving the users’ interests and needs from their behavior have become increasingly important. Web mining is used to categorize users and pages by analyzing...
Personalized PageRank with Node-Dependent Restart
Personalized PageRank is an algorithm to classify the importance of web pages on a user-dependent basis. We introduce two generalizations of Personalized PageRank with node-dependent restart. The first generalization is based on the proportion of visits to nodes before the restart, whereas the second generalization is based on the probability of visited node just before the restart. In the origi...
A Survey on PageRank Computing
This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing...
Journal title:
- Internet Mathematics
Volume 3, Issue
Pages -
Publication date 2007